QTL MAPPING IN PLANT BREEDING POPULATIONS

CROSS-REFERENCES TO RELATED APPLICATIONS

This Patent Application is related to U.S. Provisional Patent Application 
Nos. 60/068,822, filed December 22, 1997 and 60/084,048, filed May 4, 1998. Both of 
these priority documents are incorporated by reference in their entirety. 

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

Not applicable. 



BACKGROUND OF THE INVENTION

Historically, the term "quantitative trait" has been used to describe variability in expression of a phenotypic trait that shows continuous variability and is the net result of multiple genetic loci possibly interacting with each other and/or with the environment. To describe a broader phenomenon, the term "complex trait" has been used to describe any trait that does not exhibit classic Mendelian inheritance attributable to a single genetic locus (Lander & Schork, Science 265:2037 (1994)). The distinction between the terms, for purposes of this disclosure, is subtle and therefore the two terms will be used synonymously.

It is estimated that 98% of the economically important phenotypic traits in domesticated plants are quantitative traits. These traits are classified as oligogenic or polygenic based on the perceived numbers and magnitudes of segregating genetic factors affecting the variability in expression of the phenotypic trait.

The development of ubiquitous polymorphic genetic markers that span the genome (e.g., RFLP) has made it possible for quantitative and molecular geneticists to investigate what Edwards, et al., in Genetics 116:113 (1987) referred to as quantitative trait loci (QTL), as well as their numbers, magnitudes and distributions. QTL include genes that control, to some degree, numerically representable phenotypic traits that are usually continuously distributed within a family of individuals as well as within a population of families of individuals.

An experimental paradigm has been developed to identify and analyze QTL. This paradigm involves crossing two inbred lines, genotyping multiple marker loci and evaluating one to several quantitative phenotypic traits among the segregating progeny derived from the cross. The QTL are then identified on the basis of significant statistical associations between the genotypic values and the phenotypic variability among the segregating progeny. This experimental paradigm is ideal in that the parental lines of the F1 generation have the same degree of linkage, all of the associations between the genotype and phenotype in the progeny are informative and linkage disequilibrium between the genetic loci and phenotypic traits is maximized.

Because relatively few progeny are studied, the experiments described above lack the necessary statistical power to identify QTL for most traits of economic importance in breeding populations, for example, maize, sorghum, soybean, canola, etc. Additionally, the lack of statistical power produces biased estimates of the QTL that are identified. Additional imprecision is introduced in extrapolating the identification of QTL to the progeny of genetically different parents within a breeding population.

General forms of genetic and statistical models for predicting breeding values are known in the art (Henderson, Biometrics 31:423 (1975)). Specific models have also been proposed for QTL identification in animal breeding (Soller & Genizi, Biometrics 34:47 (1978); and Fernando & Grossman, Genet. Sel. Evol. 21:467 (1989)) and human populations (Goldgar, Am. J. Hum. Genet. 47:957 (1990)). However, statistical models have not been developed for plant breeding populations. Thus, there remains a need in the art for methods that take account of and are applicable to determining QTL in commercially important plant breeding populations. The invention herein satisfies this need.



SUMMARY OF THE INVENTION 
This invention provides methods of identifying quantitative trait loci in a mixed defined plant population comprising multiple plant families. The method operates by quantifying a phenotypic trait across lines sampled from the population, identifying at least one genetic marker associated with the phenotypic trait by screening a set of markers, and identifying the quantitative trait loci based on the association of the phenotypic trait and the genetic marker(s).
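The screening step can be illustrated with a minimal single-marker scan. This sketch assumes each sampled line is scored 0/1 for the presence of one allele at each marker and carries one quantified phenotype; the function names and data are hypothetical, and a Welch two-sample t statistic stands in for the fixed, random and mixed effects models described later in this disclosure.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic for the difference in mean phenotype of two groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both allele classes have zero phenotypic variance")
    return (mean(a) - mean(b)) / se

def marker_scan(genotypes, phenotypes):
    """Return {marker: t statistic} for every marker that can be tested.

    genotypes: dict marker -> list of 0/1 allele indicators, one per line
    phenotypes: list of quantified trait values in the same line order
    """
    results = {}
    for marker, calls in genotypes.items():
        present = [y for x, y in zip(calls, phenotypes) if x == 1]
        absent = [y for x, y in zip(calls, phenotypes) if x == 0]
        if len(present) < 2 or len(absent) < 2:
            continue  # too few lines in one allele class to estimate a variance
        results[marker] = welch_t(present, absent)
    return results

# Hypothetical grain-yield values for eight lines scored at two markers.
yields = [9.1, 8.7, 9.4, 8.9, 7.2, 7.6, 7.0, 7.4]
markers = {
    "M1": [1, 1, 1, 1, 0, 0, 0, 0],  # allele classes differ in yield
    "M2": [1, 0, 1, 0, 1, 0, 1, 0],  # allele classes have similar yield
}
scan = marker_scan(markers, yields)
for marker, t in sorted(scan.items(), key=lambda kv: -abs(kv[1])):
    print(marker, round(t, 2))
```

Markers are ranked by the magnitude of the statistic; in practice a significance threshold corrected for the number of markers screened would be applied before declaring an association.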
In one embodiment, the plant population consists of diploid plants, either hybrid or inbred, preferably maize, soybean, sorghum, wheat, sunflower, and canola. In a most preferred embodiment, the plant population consists of Zea mays.

The phenotypic traits associated with the QTL are quantitative, meaning that, in some context, a numerical value can be ascribed to the trait. Preferred phenotypic traits include, but are not limited to, grain yield, grain moisture, grain oil, root lodging, stalk lodging, plant height, ear height, disease resistance, and insect resistance.

In a preferred embodiment, the genetic markers associated with the QTL are restriction fragment length polymorphisms (RFLP), isozyme markers, allele specific hybridization (ASH), amplified variable sequences of the plant genome, self-sustained sequence replication, simple sequence repeats (SSR), and arbitrary fragment length polymorphisms (AFLP). In another preferred embodiment, at least two genetic markers are associated with the QTL and are identified by high throughput screening.

The association of the genetic loci and the phenotypic trait is determined through specified statistical models. In a preferred embodiment, the statistical models are linear models with fixed effects and random effects. In a particularly preferred embodiment, the statistical model is a mixed effects model wherein the phenotypic trait of the progeny of one line from one family in the breeding population is evaluated in topcross combination with a tester parent.
In yet another embodiment, the identification of QTL allows for the marker assisted selection of a desired phenotypic trait in the progeny of a diploid plant breeding population selected from the group consisting of maize, soybean, sorghum, wheat, sunflower, and canola. In a particularly preferred embodiment, the plant population consists of Zea mays. In yet another embodiment, the phenotypic trait selected for includes, but is not limited to, yield, grain moisture, grain oil, root lodging, stalk lodging, plant height, ear height, disease resistance, and insect resistance.

In another aspect of the invention, plants selected by the methods described above are provided. In addition to plants created by selfing and sexual crosses, cloned plants are described, as are transgenic plants. The transgenic plants contain nucleic acid sequences associated with a desired QTL.






DETAILED DESCRIPTION OF THE INVENTION 
I. OVERVIEW
Previously, quantitative trait loci (QTL) have been identified using a sample of segregating progeny derived from a single cross of two inbred lines, i.e., a biparental cross. The disadvantages of this method are that, for adequate statistical power, it requires a large commitment of field testing resources to be devoted to the progeny from a single cross, and inferences of associations between the genetic loci and phenotype cannot be extended beyond the specific sample set of progeny. Thus, QTL identified in this way cannot be used with confidence in a marker-aided selection development program for plant populations.

Moreover, because breeding populations undergo constant selection to improve yield and resistance to pathogens, it is impractical to monitor all relevant breeding crosses simultaneously. Thus, the effects of genetic background on particular QTL are difficult to determine with conventional methods.

The present invention overcomes the need for large numbers of progeny of a single cross by using lines derived from multiple breeding crosses and phenotypic information obtained through hybrid topcrosses, a technology familiar to the commercial plant breeder. Accordingly, the collection of phenotypic information does not require resources beyond those already committed for ongoing plant breeding.

The present invention overcomes the difficulties in inferring the results beyond the sample set of progeny through the acquisition of data from progeny sampled from multiple breeding crosses and the use of statistical models which account for genetic variability in different families of a breeding population. Thus, inferences about QTL can be drawn across the entire breeding population. This makes it possible to predict the effects of QTL alleles on phenotypic traits in multiple genetic backgrounds.

The models of the present invention are developed using statistical methods that are relevant to the structure of plant breeding populations. The models are implemented using computing and data management software. Simulations are developed to validate the statistical models. The statistical methods are then applied to genotypic and phenotypic data collected across plant breeding populations to identify and map QTL within the genomes of the plants in those populations.






II. DEFINITIONS

Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY (2d ed. 1994); THE CAMBRIDGE DICTIONARY OF SCIENCE AND TECHNOLOGY (Walker ed. 1988); THE GLOSSARY OF GENETICS, 5th Ed., Rieger, R., et al. (eds.), Springer Verlag (1991); and Hale & Marham, THE HARPER COLLINS DICTIONARY OF BIOLOGY (1991). Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, preferred methods and materials are described. As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
The term "association" or "associated with" in the context of this invention refers to genetic marker loci and quantitative trait loci that are in disequilibrium, i.e., the marker genotypes and trait phenotypes are found together in the progeny of a plant or plants more often than if the marker genotypes and trait phenotypes segregated separately.

The phrase "diploid plants" refers to plants that have two sets of chromosomes, typically one from each parent.

The phrase "expression cassette" refers to a nucleic acid sequence to be introduced into a transgenic plant that contains the nucleic acid sequence to be transcribed and a promoter to direct the transcription. The promoter can either be homologous, i.e., occurring naturally to direct the expression of the desired transgene, or heterologous, i.e., occurring naturally to direct the expression of a nucleic acid derived from a gene other than the desired transgene. Fusion genes with heterologous promoter sequences are desirable, e.g., for regulating expression of encoded proteins. In some instances, the promoter may constitutively bind transcription factors and RNA Polymerase II. In other instances, a heterologous promoter may be desirable because it has sequences that bind transcription factors that the naturally occurring promoter lacks.

The phrase "genetic marker" refers to a nucleic acid sequence present in a plant genome used to locate genetic loci that contain alleles which contribute to variability in expression of quantitative traits. Genetic markers also refer to nucleic acid sequences complementary to the genomic sequences, such as nucleic acids used as probes.

The phrase "high throughput screening" refers to assays in which the format allows large numbers of nucleic acid sequences to be screened for defined characteristics. In the context of the instant invention, high throughput screening is of nucleic acid sequences of the plant genome to identify the presence of genetic markers which co-segregate with expression of desirable phenotypic traits.

The phrase "hybrid plants" refers to plants which result from a cross between genetically divergent individuals.

The phrase "inbred plants" refers to plants derived from a cross between genetically related plants.

The term "lines" in the context of this invention refers to a family of related plants derived by self-pollinating an inbred plant.
The phrase "linkage disequilibrium" refers to a non-random association of alleles from two or more loci. It implies that a group of marker alleles or QTL alleles have been inherited together.
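As a worked illustration of this definition, the classical disequilibrium coefficient D = p_AB - p_A p_B measures how far the observed frequency of a two-locus haplotype departs from the product of its allele frequencies; D = 0 when the alleles associate at random. The sketch below assumes phased two-locus haplotypes are available as two-character strings such as "AB" (allele at locus 1, allele at locus 2); the function name and data are hypothetical.

```python
from collections import Counter

def linkage_disequilibrium(haplotypes, a="A", b="B"):
    """Return (D, r_squared) for allele `a` at locus 1 and allele `b` at locus 2.

    haplotypes: iterable of two-character strings, e.g. "AB" or "ab",
    giving the allele at locus 1 and at locus 2 on one chromosome.
    """
    haps = list(haplotypes)
    n = len(haps)
    p_ab = Counter(haps)[a + b] / n              # frequency of the a-b haplotype
    p_a = sum(1 for h in haps if h[0] == a) / n  # allele frequency at locus 1
    p_b = sum(1 for h in haps if h[1] == b) / n  # allele frequency at locus 2
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared

# Alleles always inherited together: complete disequilibrium.
print(linkage_disequilibrium(["AB"] * 50 + ["ab"] * 50))            # (0.25, 1.0)
# All four haplotypes equally frequent: no disequilibrium.
print(linkage_disequilibrium(["AB", "Ab", "aB", "ab"] * 25))        # (0.0, 0.0)
```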

The term "lodging" in the context of this invention refers to the tendency of plants to fall over prior to harvest.
The phrase "marker assisted selection" refers to selection of a plant by virtue of the presence or absence of one or more genetic marker alleles. In the context of this invention, the genetic markers have been previously associated with a QTL.
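In its simplest form, marker assisted selection is a filter over genotyped lines. In this minimal sketch the line names, marker names and the rule "keep lines that carry every favorable allele" are hypothetical; the favorable alleles would be those previously associated with a QTL.

```python
def marker_assisted_selection(line_genotypes, favorable_alleles):
    """Return the names of lines carrying the favorable allele at every listed marker.

    line_genotypes: dict line name -> dict marker -> allele call
    favorable_alleles: dict marker -> allele previously associated with a QTL
    """
    selected = [
        line for line, calls in line_genotypes.items()
        if all(calls.get(marker) == allele for marker, allele in favorable_alleles.items())
    ]
    return sorted(selected)

lines = {
    "line1": {"M1": "A", "M7": "C"},
    "line2": {"M1": "A", "M7": "T"},
    "line3": {"M1": "G", "M7": "C"},
    "line4": {"M1": "A", "M7": "C"},
}
print(marker_assisted_selection(lines, {"M1": "A", "M7": "C"}))  # ['line1', 'line4']
```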

The phrase "mixed defined plant population" refers to a plant population containing many different families and lines of plants. Typically, the defined plant population exhibits a quantitative variability for a phenotype that is of interest.

The phrase "multiple plant families" refers to different families of related plants within a population.

The phrase "operably linked" refers to a functional linkage between a promoter and a second sequence, wherein the promoter sequence initiates transcription of RNA corresponding to the second sequence.

The phrase "phenotypic trait" refers to the appearance or other characteristic of a plant, resulting from the interaction of its genome with the environment.

The term "progeny" refers to the descendants of a particular plant (self-cross) or pair of plants (cross-pollinated). The descendants can be, for example, of the F1, the F2 or any subsequent generation.

The term "promoter" refers to a nucleic acid sequence that directs expression of a coding sequence. A promoter can be constitutive, i.e., relatively independent of the stage of differentiation of the cell in which it is contained, or it can be inducible, i.e., induced by specific environmental factors, such as the length of the day, the temperature, etc., or a promoter can be tissue-specific, i.e., directing the expression of the coding sequence in cells of a certain tissue type.
The phrase "quantified population phenotype" refers to a phenotypic trait present in a plant population that exhibits continuous variability and is the result of either a genetic locus interacting with the environment or multiple genetic loci possibly interacting with each other or with the environment. An example of a quantified population phenotype is plant height. Typically in the plant population, the frequency distribution of a phenotypic trait exhibits a bell curve.

The phrase "quantitative trait loci" refers to segregating genetic factors which affect the variability in expression of a phenotypic trait.

The phrase "sexually crossed" or "sexual reproduction" in the context of this invention refers to the fusion of gametes to produce seed by pollination. A "sexual cross" is pollination of one plant by another. "Selfing" is the production of seed by self-pollination, i.e., pollen and ovule are from the same plant.

The phrase "tester parent" refers to a parent that is unrelated to and genetically different from a set of lines to which it is crossed. The cross is for purposes of evaluating differences among the lines in topcross combination. Using a tester parent in a sexual cross allows one of skill to determine the association of a phenotypic trait with expression of quantitative trait loci in a hybrid combination.

The phrases "topcross combination" and "hybrid combination" refer to the process of crossing a single tester parent to multiple lines. The purpose of producing such crosses is to evaluate the ability of the lines to produce desirable phenotypes in hybrid progeny derived from the line by the tester cross.

The phrase "transgenic plant" refers to a plant into which exogenous polynucleotides have been introduced by any means other than sexual cross or selfing. Examples of means by which this can be accomplished are described below, and include Agrobacterium-mediated transformation, biolistic methods, electroporation, in planta techniques, and the like. Such a plant containing the exogenous polynucleotides is referred to here as an R1 generation transgenic plant. Transgenic plants may also arise from sexual cross or by selfing of transgenic plants into which exogenous polynucleotides have been introduced.
III. Development of Genetic and Statistical Models for Identifying and Mapping QTL in Plant Breeding Populations



After genetic markers have been identified, e.g., using RFLP or other methods discussed herein, the degree of association of the genetic markers to the quantitated phenotypic trait can be used to identify and map QTL. This is done through use of statistical models.

A. Fixed Effects Model

In a fixed effects model, members of one family or full siblings are used to determine the association between genetic markers and a phenotypic trait. Soller & Genizi first proposed fixed effects models for identifying QTL using full-sibling and half-sibling population structures (Soller & Genizi, Biometrics 34:47 (1978)). Inferences about QTL effects and genomic sites derived from the association between the phenotypic trait and the genetic marker using this model are specific to the sample of lines and progeny used for the evaluation. These inferences cannot be extended to other families or progeny because the model does not view the genotypic and phenotypic data as a representative sample from a large population. The statistical model follows the form of Equation 1:

$$Y_{qi} = m + f_i + C\,X_{q(i)} + e_{qi}$$

Equation 1

wherein $Y_{qi}$ is the phenotype of allele q in family i,

$m$ is the average of the phenotype in the breeding population,

$f_i$ is the effect of family i;

$C$ is the combining ability of the QTL allele. C is unknown and is estimated as the difference in phenotype between homozygotes in the line per se from the line phenotype evaluated in topcrossed progeny (Beavis, W., et al., Crop Science 34:882 (1994)).

$X_{q(i)}$ is an indicator variable taking on values of 1 or 0 for the allele's presence or absence in the lines from family i; and
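For a fixed effects model of the form of Equation 1, the allele effect C can be estimated by ordinary least squares. Because the overall mean and the family effects absorb each family's mean, the least-squares estimate of C equals the regression of within-family deviations of the phenotype on within-family deviations of the allele indicator. This is a minimal sketch: the record layout, function name and data are hypothetical, and the residuals are assumed independent with constant variance.

```python
from collections import defaultdict

def fixed_effect_allele_estimate(records):
    """Least-squares estimate of C in Y = m + f_i + C*X + e.

    records: list of (family, x, y) tuples, where x is the 0/1 allele
    indicator and y the phenotype. Sweeping out family means first is
    equivalent to fitting a separate intercept for every family.
    """
    by_family = defaultdict(list)
    for fam, x, y in records:
        by_family[fam].append((x, y))

    sxy = sxx = 0.0
    for rows in by_family.values():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    if sxx == 0:
        raise ValueError("allele indicator does not vary within any family")
    return sxy / sxx

# Hypothetical data: two families with different means and an allele effect of 2.0.
data = [
    ("F1", 1, 12.0), ("F1", 0, 10.0), ("F1", 1, 12.0), ("F1", 0, 10.0),
    ("F2", 1, 7.0), ("F2", 0, 5.0), ("F2", 1, 7.0), ("F2", 0, 5.0),
]
print(fixed_effect_allele_estimate(data))  # 2.0
```

Consistent with the limitation noted above, such an estimate describes only the families that were sampled.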
B. Random Effects Model

Because members of families are often genetically related and represent only a sample of all possible breeding crosses within a population, a model which takes this into account is needed.

A random effects model differs from the fixed effects model in that there are no estimated allele effects. Rather, an estimate is made of the proportion of $\sigma^2_p$, the phenotypic variability, that can be ascribed to the variability in alleles at the QTL. Unlike the fixed effects model, it is possible to predict genotypic effects for sampled alleles at the QTL in untested progeny. Also, unlike the fixed effects model, predicted phenotypes can be extended to other related families in the breeding population. Random effects models have been prepared for full-sibling and half-sibling family structures in human pedigrees (Goldgar, Am. J. Hum. Genet. 47:957 (1990)) and for general outbred populations (Xu & Atchley, Genetics 141:1189 (1995)). The model follows Equation 2.

$$Y_{ij} = m + c_{ij} + A_{ij} + e_{ij}$$

Equation 2

wherein $Y_{ij}$ is the phenotype of line j in family i,

$m$ is the average of the phenotype of the breeding population,

$c_{ij}$ is the combining ability of the QTL, linked to the marker locus, in line j of family i and is $\sim N(0, \sigma^2_c)$,

$A_{ij}$ is the combining ability of all QTL unlinked to the marker locus in line j of family i, i.e., it is the sum of the polygenic background effects that are not genetically linked to the QTL, and is $\sim N(0, \sigma^2_a)$.

In this model,

$$E(Y_{ij}) = m,$$

$$V(Y_{ij}) = \sigma^2_c + \sigma^2_a + \sigma^2_e = \sigma^2_p, \text{ and}$$

$$\mathrm{Cov}(Y_{ij}, Y_{ij'}) = \pi_{jj'}\,\sigma^2_c + \delta_{jj'}\,\sigma^2_a,$$

wherein $\pi_{jj'}$ is the proportion of alleles that are identical by descent (IBD) at the QTL between lines j and j' of family i, and $\delta_{jj'}$ is the proportion of alleles that are IBD at all remaining QTL between lines j and j' of family i. $\pi_{jj'}$ is conditional on knowledge of pedigree relationships for linked marker locus genotypes.
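To make this covariance structure concrete, the sketch below assembles the within-family covariance matrix implied by Equation 2 from given IBD proportions (at the marker-linked QTL and at all remaining QTL) and variance components, and evaluates the multivariate normal log-likelihood of the observed line phenotypes. Maximizing such a likelihood over the variance components is one common way to fit a random QTL model; that choice, the function names, the independent residual with variance var_e, and the example values are illustrative assumptions rather than details taken from this disclosure.

```python
import math

def covariance_matrix(pi, delta, var_c, var_a, var_e):
    """Within-family covariance of line phenotypes under Equation 2.

    pi[j][k]: proportion of alleles IBD at the marker-linked QTL between lines j and k
    delta[j][k]: proportion of alleles IBD at all remaining QTL between lines j and k
    The diagonal is the total phenotypic variance var_c + var_a + var_e.
    """
    n = len(pi)
    return [[(var_c + var_a + var_e) if j == k
             else pi[j][k] * var_c + delta[j][k] * var_a
             for k in range(n)] for j in range(n)]

def cholesky(v):
    """Lower-triangular L with L L^T = v; raises if v is not positive definite."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = v[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def log_likelihood(y, m, v):
    """Multivariate normal log-likelihood of phenotypes y with common mean m, covariance v."""
    low = cholesky(v)
    n = len(y)
    z = []
    for i in range(n):  # forward substitution: solve L z = y - m
        z.append((y[i] - m - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + sum(t * t for t in z))

# Hypothetical family of three lines: lines 0 and 1 share most alleles at the QTL.
pi = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
delta = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
v = covariance_matrix(pi, delta, var_c=2.0, var_a=1.0, var_e=1.0)
print(log_likelihood([10.5, 11.0, 9.0], 10.0, v))
```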



C. Mixed Effects Model

Random effects models do not allow for tester effects. Testers are selected inbred plant lines used to evaluate lines of a family through hybrid (topcross) combination. Because testers are specifically selected, their effects on the phenotype of the progeny are fixed. Therefore, the resulting model consists of mixed random and fixed effects and follows Equation 3.

$$Y_{ijk} = m + T_k + c_{ijk} + A_{ijk} + e_{ijk}$$

Equation 3

wherein $Y_{ijk}$ is the phenotypic value of the progeny of line j from family i evaluated in topcross combination with tester k,

$m$ is the average phenotype of the breeding population,

$T_k$ is the fixed effect of tester k,

$c_{ijk}$ is the combining ability of the alleles, at the QTL linked to the marker loci, with tester k and is $\sim N(0, \sigma^2_c)$,

$A_{ijk}$ is the combining ability of the alleles, at all QTL unlinked to the marker loci, with tester k. It is the sum of the polygenic background effects in combination with tester k not "linked" to the QTL and is $\sim N(0, \sigma^2_a)$, and

The same inferences from the random effects, $c_{ijk}$ and $A_{ijk}$, are made as in the random effects model. The mixed effects model is an adaptation of a model first proposed by Fernando & Grossman in Genet. Sel. Evol. 21:467 (1989) for family structures in animal breeding populations, where it is usually used to describe herds and management practices.

In order to obtain estimates and predictions of effects in the model, the mixed effects model, Equation 3, is translated into incidence matrices as described in, for example, Henderson, C., Biometrics pp. 226 (1952); Henderson, C., Biometrics 31:423 (1975); Harville, D., The Annals of Statistics 4:384 (1976); Harville, D., J. Amer. Statistical Ass'n 72:320 (1977); and Searle, S., et al., Variance Components, John Wiley & Sons, Inc., N.Y. (1992).
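Once Equation 3 is written with incidence matrices as y = Xb + Zu + e, with tester effects in b and random combining abilities in u, estimates and predictions can be obtained from Henderson's mixed model equations. The sketch below solves them for the simplest case, assuming G = I·var_u and R = I·var_e with a known variance ratio lam = var_e / var_u and a single random effect per line; the IBD-based covariances of the random effects are omitted for brevity. Tester effects are coded as cell means so that they absorb m. All names and data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def mixed_model_equations(y, x, z, lam):
    """Henderson's mixed model equations with G = I*var_u and R = I*var_e.

    y: observations; x: fixed-effect incidence rows; z: random-effect incidence rows;
    lam: variance ratio var_e / var_u, added to the diagonal of Z'Z.
    Returns (fixed effect estimates, random effect predictions).
    """
    p, q = len(x[0]), len(z[0])
    w = [list(xr) + list(zr) for xr, zr in zip(x, z)]  # rows of [X Z]
    lhs = [[sum(row[i] * row[j] for row in w) for j in range(p + q)]
           for i in range(p + q)]
    for i in range(p, p + q):
        lhs[i][i] += lam                                 # shrinks the random effects
    rhs = [sum(row[i] * obs for row, obs in zip(w, y)) for i in range(p + q)]
    sol = solve(lhs, rhs)
    return sol[:p], sol[p:]

# Hypothetical topcross data: three lines, each crossed to testers T1 and T2.
y = [10.0, 12.0, 8.0, 10.0, 9.0, 11.0]
x = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]]                   # T1, T2
z = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]  # lines
testers, lines = mixed_model_equations(y, x, z, lam=1.0)
print([round(t, 3) for t in testers], [round(u, 3) for u in lines])
```

With these numbers the tester effects come out as 9 and 11, and the line predictions are about 0.667, -0.667 and 0: each line's raw deviation of +1, -1 or 0 from the tester means is shrunk by the factor 2 / (2 + lam) = 2/3 because each line has two observations.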
IV. Quantitative Trait Loci Determined by Linkage of Phenotypic Traits with Genetic Markers



A. Phenotypic Traits Determined by Multiple Genes

Many of the commercially desired traits of domesticated crops are determined by multiple genes. These include such quantitative traits as plant height, grain yield, moisture and/or oil content of grain or seed, ear height (in maize), root and stalk lodging, and disease and insect resistance.

Phenotypic traits determined by multiple genes are typically continuous and follow a bell curve, with the greatest number of plants in a population exhibiting the average of the quantitative phenotypic trait. This is in contrast with single-locus Mendelian genetics, in which dominant and recessive alleles produce one of two possible phenotypes.
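The contrast can be seen with a simple additive model. Assume, for illustration only, a population of inbred lines in which each of n unlinked loci is homozygous for the favorable parental allele with probability 1/2, and the trait value is the number of favorable loci. That count follows a binomial distribution: one locus gives two equally frequent classes, while many loci give a bell-shaped distribution centered on the average. Equal effects, additivity and absence of linkage are simplifying assumptions.

```python
from math import comb

def favorable_locus_distribution(n_loci, p=0.5):
    """Probability that exactly k of n_loci independent loci carry the favorable allele."""
    return [comb(n_loci, k) * p**k * (1 - p)**(n_loci - k) for k in range(n_loci + 1)]

# One locus: two phenotypic classes, as in simple Mendelian segregation.
print(favorable_locus_distribution(1))   # [0.5, 0.5]

# Ten loci: most lines fall near the middle of the range.
ten = favorable_locus_distribution(10)
print(max(range(11), key=lambda k: ten[k]))  # 5
```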

In addition to the genetic element of complex traits, environmental dynamics must be taken into account in breeding plant populations. This is done by analyzing a QTL in a variety of populations in a variety of different environments. In an alternate and preferred method, lines from multiple families within a population are crossed with tester parents, which have defined genotypes. Progeny from these crosses can be evaluated for phenotypic traits of interest in one environment or in multiple environments to determine the extent to which changes in the environment affect expression of the quantitative traits.

B. Genetic Markers 

In the following discussion, the phrase "nucleic acid," "polynucleotide," "polynucleotide sequence" or "nucleic acid sequence" refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. Unless specifically stated, the term encompasses nucleic acids containing known analogs of natural nucleotides which have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence of this invention also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer, et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka, et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini, et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene.

To identify genetic markers, labeled oligonucleotides that are complementary to the genetic marker are hybridized to the nucleic acid sequences of the individual plants. Two single-stranded nucleic acids "hybridize" when they form a double-stranded duplex. The region of double-strandedness can include the full length of one or both of the single-stranded nucleic acids, or all of one single-stranded nucleic acid and a subsequence of the other single-stranded nucleic acid, or the region of double-strandedness can include a subsequence of each nucleic acid. An overview of the hybridization of nucleic acids is found in Tijssen, LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY-HYBRIDIZATION WITH NUCLEIC ACID PROBES, Part I, Chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays," Elsevier, New York (1993).

"Stringent conditions" in the context of nucleic acid hybridization are sequence dependent and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen, supra. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Highly stringent conditions are selected to be equal to the Tm for a particular probe. Nucleic acids which encode polypeptides and do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.
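For long DNA probes hybridized on filters, Tm is often estimated with the empirical relation of Meinkoth and Wahl (1984), Tm = 81.5 + 16.6 log10(M) + 0.41(%GC) - 0.61(% formamide) - 500/L, where M is the molar concentration of monovalent cations and L is the duplex length in base pairs. That formula is not part of this disclosure; the sketch below uses it only to illustrate how the "about 5°C below Tm" rule translates into a hybridization temperature. Function names and the example probe are hypothetical.

```python
import math

def melting_temperature(seq, na_molar, formamide_pct=0.0):
    """Empirical Tm (deg C) of a long DNA-DNA hybrid (Meinkoth & Wahl, 1984).

    seq: probe sequence (its length is used as the duplex length L)
    na_molar: monovalent cation concentration in mol/L
    formamide_pct: percent formamide in the hybridization solution
    """
    length = len(seq)
    gc_pct = 100.0 * sum(base in "GCgc" for base in seq) / length
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_pct
            - 0.61 * formamide_pct - 500.0 / length)

def stringent_temperature(seq, na_molar, formamide_pct=0.0, offset=5.0):
    """Hybridization temperature `offset` degrees below the estimated Tm."""
    return melting_temperature(seq, na_molar, formamide_pct) - offset

probe = "GC" * 50 + "AT" * 50   # hypothetical 200-base probe, 50% GC
print(round(melting_temperature(probe, 1.0), 2))    # 99.5
print(round(stringent_temperature(probe, 1.0), 2))  # 94.5
```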

An example of stringent hybridization conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on a filter in a Southern or northern blot is 50% formamide with heparin at 42°C, the hybridization being carried out overnight. An example of stringent wash conditions is a 0.2x SSC wash at 65°C for 15 minutes (see Sambrook et al., MOLECULAR CLONING - A LABORATORY MANUAL (2nd ed.) Vol. 1-3 (1989) (Sambrook, et al.) for a description of SSC buffer and wash conditions in general). Often the high stringency wash is preceded by a low stringency wash to remove background probe signal. An example of a low stringency wash for a probe with at least about 100 complementary nucleic acids is 2x SSC at 40°C for 15 minutes. In general, a signal to noise ratio of 2x (or higher) than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization.

Genetic Variability 

The ability to characterize an individual by its genome is due to the inherent variability of genetic information. Although DNA sequences which encode necessary proteins are well conserved across a species, there are regions of DNA which are non-coding or code for proteins or portions of proteins which do not have a critical function, and in these regions conservation of nucleic acid sequence is not necessary. These variable regions can be identified by genetic markers. Typically, genetic markers are variable regions of a genome and the complementary oligonucleotides which bind to these regions. In some instances, the presence or absence of binding to a genetic marker identifies individuals by their unique nucleic acid sequence. In other instances, a genetic marker is found in all individuals but the individual is identified by where, in the genome, the genetic marker is located.

The major causes of genetic variability, and thus the major sources of genetic markers, are addition, deletion and point mutations, recombination events and transposable elements within the genome of individuals in a plant population.

Point mutations can be the result of inaccuracy in DNA replication. During meiosis in the creation of germ cells or in mitosis to create daughter cells, DNA polymerase "switches" bases, either transitionally (i.e., a purine for a purine and a pyrimidine for a pyrimidine) or transversionally (i.e., purine to pyrimidine and vice versa). The base switch is maintained if the exonuclease function of DNA polymerase does not correct the mismatch. At germination, or the next cell division (in clonal cells), the DNA strand with the point mutation becomes the template for a complementary strand and the base switch is incorporated into the genome.
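The two kinds of base switch described above can be told apart mechanically. This minimal sketch compares two aligned sequences of equal length and labels each single-base difference; the function names are hypothetical, only A, C, G and T are accepted, and insertions or deletions are not handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(original, new):
    """Label a single-base substitution as a transition or a transversion."""
    original, new = original.upper(), new.upper()
    if not {original, new} <= PURINES | PYRIMIDINES:
        raise ValueError("only A, C, G and T are supported")
    if original == new:
        return "none"
    if {original, new} <= PURINES or {original, new} <= PYRIMIDINES:
        return "transition"    # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"      # purine <-> pyrimidine

def point_mutations(seq_a, seq_b):
    """List (position, base_a, base_b, kind) for two aligned, equal-length sequences."""
    return [(i, a, b, classify_substitution(a, b))
            for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

print(point_mutations("GATTACA", "GACTAGA"))
# [(2, 'T', 'C', 'transition'), (5, 'C', 'G', 'transversion')]
```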

Additions and deletions of nucleic acid sequences can be due to inaccurate recombination events. Recombination occurs when sister chromatids are aligned during crossover events. One of the DNA strands of the chromatids breaks and recA protein anneals the broken strand to a complementary sequence on the sister chromatid, displacing the resident strand. If a single-stranded sequence contains regions of oligonucleotide repeats, the recA protein may incorrectly use, as a template, another region of the sister chromatid which also contains the same oligonucleotide repeats. As is the case with point mutations, if the mismatched recombination is not corrected before the next cell division, one of the daughter cells will have an additional region of oligonucleotide repeats in its genome and the other will have a deletion in its genome.

Transposable elements refer to sequences of DNA which have the ability to move or to jump to new locations within a genome. Two components are required for transposition: the transposase enzyme which catalyzes transposition and the nucleotide sequences present at the end of the transposon upon which the enzyme acts. Transposons may be either autonomous or non-autonomous. Autonomous transposons are those which are capable of both transposing and catalyzing the transposition of non-autonomous elements. Examples of autonomous transposons are the Ac elements and Spm transposons isolated from maize, all of which have been cloned and are well described in the art. See, for example, U.S. Patent No. 4,732,856 and Gierl, et al., Plant Mol. Biol. 13:261-266 (1989), which are incorporated by reference herein.

Autonomous transposons comprise sequences for transposase and
sequences which are recognized by the transposase enzyme at the ends of the transposon
(the "Ds element"). The sequences for transposase (or the transposase gene) are active
independent of the end sequences, i.e., if the end sequences are eliminated, the activity of
the transposase gene is preserved, and the enzyme-encoding element may thus be used in
conjunction with a non-autonomous or Ds element to trigger transposition of the Ds
element. The transposase gene is evident in the Ts101 and Ts105 elements.

Only the DNA sequences present at the ends of a non-autonomous element
are required for it to be transpositionally active in the presence of the transposase gene.
These ends are referred to herein as the "transposon ends" or the "Ds element." See, for
example, Coupland, et al., Proc. Nat'l Acad. Sci. USA 86:9385 (1989), which describes
the sequences necessary for transposition. The DNA sequences internal to the transposon
ends are non-essential and can be comprised of sequences from virtually any source.
Restriction Fragment Length Polymorphisms (RFLP)
The net result of the mutations and changes in the DNA sequence of
individuals, as described above, is that different individuals will have different sequences,
particularly in non-coding regions of the genome. When these DNA sequences are digested
with restriction endonucleases which recognize specific restriction sites, the fragments
will be of different lengths. The resulting fragments are restriction fragment length
polymorphisms.

The phrase "restriction fragment length polymorphism" or "RFLP" refers
to inherited differences in restriction enzyme sites (for example, caused by base changes
in the target site), or additions or deletions in the region flanked by the restriction enzyme
sites, that result in differences in the lengths of the fragments produced by cleavage with a
relevant restriction enzyme. A point mutation will lead either to longer fragments, if the
mutation destroys a restriction site, or to shorter fragments, if the mutation creates a
restriction site. Additions and transposable element insertions will lead to longer
fragments, and deletions will lead to shorter fragments.
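The effect of these sequence changes on fragment lengths can be illustrated with a minimal
in silico digest. This Python sketch assumes a linear sequence, exact matches to the
recognition site, and EcoRI (G^AATTC, cutting after the first base of the site) as the
example enzyme; it reports top-strand lengths only and ignores sticky-end overhangs. The
function name digest and the three allele sequences are hypothetical.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths from cutting a linear sequence at every site.

    cut_offset is the cut position within the recognition site; EcoRI
    (G^AATTC) cuts after the first base.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries are the sequence ends plus every cut position.
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


allele_1 = "AAAAGAATTCAAAAAAAAAA"      # intact EcoRI site
allele_2 = "AAAAGATTTCAAAAAAAAAA"      # point mutation destroys the site
allele_3 = "AAAAGAATTCAAAAACCCAAAAA"   # 3-bp addition downstream of the site
print(digest(allele_1))  # [5, 15]
print(digest(allele_2))  # [20]
print(digest(allele_3))  # [5, 18]
```

Destroying the site merges two fragments into one longer fragment, while the addition
lengthens only the fragment that contains it, matching the cases described above.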
An RFLP can be used as a genetic marker in the determination of
segregation of alleles with quantitative phenotypes. In one embodiment of the invention,
the restriction fragments are linked to specific phenotypic traits. More specifically, the
presence of a particular restriction fragment is used to predict the prevalence of a specific
phenotypic trait.


Amplified Variable Sequences 

In one embodiment, amplified variable sequences of the plant genome and
complementary nucleic acid probes are used as genetic markers. The phrase "amplified
variable sequences" refers to amplified sequences of the plant genome which exhibit high
nucleic acid residue variability between members of the same species. All organisms have
variable genomic sequences, and each organism (with the exception of a clone) has a
different set of variable sequences. Once identified, the presence of a specific variable
sequence can be used to predict phenotypic traits. Preferably, DNA from the plant serves
as a template for amplification with primers that flank a variable sequence. The
variable sequence is amplified by amplification techniques and sequenced. In vitro
amplification techniques are well known. Examples of techniques sufficient to direct
persons of skill through such in vitro amplification methods, including the polymerase
chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and
other RNA polymerase mediated techniques (e.g., NASBA), are found in Berger &
Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, vol. 152,
Academic Press, Inc., San Diego, CA (Berger); Sambrook, et al.; and Current
Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint
venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (1994
Supplement) (Ausubel), as well as U.S. Patent No. 4,683,202; PCR Protocols: A Guide
to Methods and Applications, Innis et al., eds., Academic Press Inc., San Diego, CA
(1990) (Innis); Arnheim & Levinson (October 1, 1990) C&EN 36-47; Kwoh, et al., Proc.
Nat'l Acad. Sci. USA 86:1173 (1989); Guatelli, et al., Proc. Nat'l Acad. Sci. USA
87:1874 (1990); Lomeli, et al., J. Clin. Chem. 35:1826 (1989); Landegren, et al., Science
241:1077 (1988); Van Brunt, Biotechnology 8:291 (1990); Wu & Wallace, Gene 4:560
(1989); Barringer, et al., Gene 89:17 (1990); and Sooknanan & Malek, Biotechnology
13:563 (1995). Improved methods of cloning in vitro amplified nucleic acids are
described in U.S. Pat. No. 5,426,039.
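To make concrete what "primers that flank a variable sequence" yield, the sketch below
predicts a PCR product in silico. It assumes exact primer matches, a forward primer that
matches the template strand as written, and a reverse primer that anneals to the opposite
strand (so its reverse complement is searched for downstream). The helper names and the
short template sequences are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the order (5'->3' of the other strand)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predicted_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the sequence expected between two primers, or None if either fails."""
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rc = reverse_complement(rev)
    end = template.find(rc, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rc)]


# Two hypothetical alleles differing only in the variable region between the primers.
allele_a = "CCCCACGTACGGGGTTGCCACCCC"
allele_b = "CCCCACGTACGGGGGGTTGCCACCCC"
print(predicted_amplicon(allele_a, "ACGTAC", "TGGCAA"))  # ACGTACGGGGTTGCCA
```

Because the primers bind conserved flanks, the two alleles give products of different
length and sequence, which is what makes the amplified region usable as a marker.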

Oligonucleotides for use as primers, e.g., in in vitro amplification methods,
and for use as nucleic acid sequence probes are typically synthesized chemically
according to the solid phase phosphoramidite triester method described by Beaucage &
Caruthers, Tetrahedron Lett. 22:1859-1862 (1981).

Nucleic acid sequencing techniques are also well known. Commonly used
techniques such as the dideoxy chain termination method (Sanger, et al., Proc. Nat'l
Acad. Sci. USA 74:5463 (1977)) and the Maxam and Gilbert method (Maxam & Gilbert,
Methods in Enzymology 65:499 (1980)) can be used in practicing this invention. In
addition, other nucleic acid sequencing methods, such as fluorescence-based techniques
(U.S. Patent No. 5,171,534), mass spectroscopy (U.S. Patent No. 5,174,962) and
capillary electrophoresis (U.S. Patent No. 5,728,282), can be used.

Other amplification methods include the ligase chain reaction (LCR), the
transcription-based amplification system (TAS), and the self-sustained sequence
replication system.

Self-sustained Sequence Replication

In another embodiment of the invention, genetic markers are identified by
self-sustained sequence replication. The phrase "self-sustained sequence replication"
refers to a method of nucleic acid amplification using target nucleic acid sequences which
are amplified (replicated) exponentially in vitro under isothermal conditions by using
three enzymatic activities essential to retroviral replication: (1) reverse transcriptase, (2)
RNase H, and (3) a DNA-dependent RNA polymerase (Guatelli, et al., Proc. Nat'l Acad.
Sci. USA 87:1874 (1990)). By mimicking the retroviral strategy of RNA replication by
means of cDNA intermediates, this reaction accumulates cDNA and RNA copies of the
original target.

"Substantially isothermal" means that the temperature may be varied over
the course of an approximately one-hour reaction time within the temperature range of
about 37°C to 50°C. Alternatively, one temperature may be selected to carry out the entire
reaction. Self-sustained sequence replication at 45°C is preferred.

Arbitrary Fragment Length Polymorphisms (AFLP) 
In another embodiment, arbitrary fragment length polymorphisms (AFLP)
are used as genetic markers (Vos, P., et al., Nucl. Acids Res. 23:4407 (1995)). The phrase
"arbitrary fragment length polymorphism" refers to selected restriction fragments which
are amplified before or after cleavage by a restriction endonuclease. The amplification
step allows easier detection of specific restriction fragments, rather than determining the
size of all restriction fragments and comparing the sizes to a known control.

AFLP allows the detection of a large number of polymorphic markers (see,
supra) and has been used for genetic mapping of plants (Becker, J., et al., Mol. Gen.
Genet. 249:65 (1995); and Meksem, K., et al., Mol. Gen. Genet. 249:74 (1995)) and to
distinguish among closely related bacterial species (Huys, G., et al., Int'l J. Systematic
Bacteriol. 46:572 (1996)).

Isozyme Markers

Other embodiments include identification of isozyme markers and allele-
specific hybridization. Isozymes are multiple forms of an enzyme that are distinct
from one another in nucleic acid and/or amino acid sequence. Some isozymes are
multimeric enzymes containing slightly different subunits. Other isozymes are either
multimeric or monomeric but have been cleaved from the proenzyme at different sites in
the amino acid sequence. For the purpose of this invention, differing isozymes are
determined at the nucleic acid sequence level. Primers which flank a variable portion
of the isozyme nucleic acid sequence are hybridized to the plant genome. The variable
region is amplified and sequenced. From the sequence, the different isozymes are
determined and linked to phenotypic characteristics.



Allele-Specific Hybridization (ASH)
In yet another embodiment, allele-specific hybridization is used to identify
genetic markers. ASH technology is based on the stable annealing of a short,
single-stranded oligonucleotide probe to a completely complementary single-stranded
target nucleic acid. The hybridization can then be detected from a radioactive or
non-radioactive label on the probe.

ASH markers are polymorphic. For each polymorphism, two or more
different ASH probes are designed to have identical DNA sequences except at the
polymorphic nucleotides. Each probe has exact homology with one allele sequence,
so that the full set of probes can distinguish all the alternative allele sequences.
Each probe is hybridized against the target DNA. With appropriate probe design and
stringency conditions, a single-base mismatch between the probe and target DNA will
prevent hybridization. In this manner, only one of the alternative probes will hybridize to
a target sample that is homozygous or homogeneous for an allele (an allele is defined by
the DNA homology between the probe and target). Samples that are heterozygous or
heterogeneous for two alleles will hybridize to both of the two alternative probes.

ASH markers can also be used as dominant markers, where the presence or absence
of only one allele is determined from hybridization or lack of hybridization by only one
probe. The alternative allele may then be inferred from the lack of hybridization.
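The two scoring modes described above (codominant scoring with two allele-specific probes,
and dominant scoring with one probe) can be sketched as simple decision rules. This Python
example assumes idealized stringency, in which a probe binds only a perfectly matching target
and any single-base mismatch prevents binding. For readability each probe is written in the
same sense as the allele it detects rather than as its complement. All function names and
sequences are hypothetical.

```python
def hybridizes(probe: str, sample_alleles: list[str]) -> bool:
    """Idealized stringency: the probe binds only where it matches exactly."""
    return any(probe.upper() in allele.upper() for allele in sample_alleles)


def call_codominant(probe_1: str, probe_2: str, sample_alleles: list[str]) -> str:
    """Score a sample with two probes that differ only at the polymorphic base."""
    h1 = hybridizes(probe_1, sample_alleles)
    h2 = hybridizes(probe_2, sample_alleles)
    if h1 and h2:
        return "heterozygous"
    if h1:
        return "homozygous allele 1"
    if h2:
        return "homozygous allele 2"
    return "no call"


def call_dominant(probe_1: str, sample_alleles: list[str]) -> str:
    """Score with one probe; the alternative allele is only inferred from no signal."""
    return "allele 1 present" if hybridizes(probe_1, sample_alleles) else "allele 1 absent"


allele_1, allele_2 = "AAGCTTGACC", "AAGCATGACC"   # differ at one position (T/A)
probe_1, probe_2 = "GCTTGA", "GCATGA"
print(call_codominant(probe_1, probe_2, [allele_1, allele_2]))  # heterozygous
```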

An ASH probe and target molecules are optionally either RNA or
denatured DNA; the target molecule(s) is/are any length of nucleotides beyond the
sequence that is complementary to the probe; the probe is designed to hybridize with
either strand of a DNA target; the probe ranges in size to conform to variously stringent
hybridization conditions, etc.

The polymerase chain reaction (PCR) allows the target sequence for ASH
to be amplified from low concentrations of nucleic acid in relatively small volumes.
Otherwise, the target sequence from genomic DNA is digested with a restriction
endonuclease and size separated by gel electrophoresis. Hybridizations typically occur
with the target sequence bound to the surface of a membrane or, as described in U.S.
Patent 5,468,613, the ASH probe sequence may be bound to a membrane.
In one aspect of this embodiment, utilizing nucleotide alleles and
polymorphisms described here, ASH data are obtained by amplifying nucleic acid
fragments (amplicons) from genomic DNA using PCR, transferring the amplicon target
DNA to a membrane in a dot-blot format, hybridizing a labeled oligonucleotide probe to
the amplicon target, and observing the hybridization dots by autoradiography.

Simple Sequence Repeats (SSR) 

In yet another basis for providing a genetic linkage map, SSR takes
advantage of high levels of di-, tri- or tetra-nucleotide tandem repeats within a genome.
Dinucleotide repeats have been reported to occur in the human genome as many as
50,000 times, with the number of repeat units (n) varying from 10 to 60 (Jacob, et al.,
Cell 67:213 (1991)). Dinucleotide repeats have also been found in higher plants
(Condit & Hubbell, Genome 34:66 (1991)).

Briefly, SSR data are generated by hybridizing primers to conserved regions
of the plant genome which flank the SSR region. PCR is then used to amplify the
nucleotide repeats between the primers. The amplified sequences are then
electrophoresed to determine their size and therefore the number of di-, tri- or tetra-
nucleotide repeats.
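The step from amplicon size to repeat number is simple arithmetic once the non-repeat
portion of the amplicon is known. The sketch below assumes that this fixed portion (both
primers plus any non-repeat sequence between each primer and the repeat, here called
flank_bp) is known for the primer pair; the function name and the 110 bp value are
hypothetical.

```python
def repeat_units(amplicon_bp: int, flank_bp: int, motif_len: int) -> int:
    """Number of tandem repeat units in an SSR amplicon of measured size.

    flank_bp: combined length of the non-repeat sequence in the amplicon
    (primers plus spacer), which is constant for a given primer pair.
    motif_len: 2, 3 or 4 for di-, tri- or tetra-nucleotide repeats.
    """
    core = amplicon_bp - flank_bp
    if core < 0 or core % motif_len != 0:
        raise ValueError("size is not consistent with whole repeat units")
    return core // motif_len


# Two alleles of a hypothetical dinucleotide SSR scored with a 110 bp flank.
print(repeat_units(150, 110, 2))  # 20
print(repeat_units(144, 110, 2))  # 17
```

A size that does not correspond to a whole number of units is rejected rather than rounded,
since it usually signals a sizing error or an imperfect repeat.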



High Throughput Screening

In one aspect of the invention, the determination of genetic marker
alleles is done by high throughput screening. In one embodiment, high throughput
screening involves providing a library of genetic markers including RFLPs, AFLPs,
isozymes, specific alleles and variable sequences, including SSR. Such "libraries" are
then screened against plant genomes. Once the genetic marker alleles of a plant have
been identified, a link between the marker allele and a desired phenotypic trait can be
determined through statistical associations based on the methods described herein.

High throughput screening can be performed in many different formats.
Hybridization can take place in a 96-, 324-, or 1024-well format, in a matrix on a
silicon chip, or in other formats not yet developed.

In a well-based format, a dot blot apparatus is used to deposit samples of
fragmented and denatured genomic DNA on a nylon or nitrocellulose membrane. After
cross-linking the nucleic acid to the membrane, either through exposure to ultraviolet
light if nylon membranes are used or by heat if nitrocellulose is used, the membrane is
incubated with a labeled hybridization probe. The labels are incorporated into the nucleic
acid probes by any of a number of means well known to those of skill in the art. The
membranes are washed extensively to remove non-hybridized probes, and the presence of
label from the bound probe is determined.

In one embodiment, a label is simultaneously incorporated during the
amplification procedure in the preparation of the nucleic acid probes. Thus, for example,
polymerase chain reaction (PCR) with labeled primers or labeled nucleotides provides a
labeled amplification product. In another embodiment, transcription amplification using a
labeled nucleotide (e.g., fluorescein-labeled UTP and/or CTP) incorporates a label into
the transcribed nucleic acid probes.

Detectable labels suitable for use in the present invention include any
composition detectable by spectroscopic, radioisotopic, photochemical, biochemical,
immunochemical, electrical, optical or chemical means. Useful labels in the present
invention include biotin for staining with labeled streptavidin conjugate, magnetic beads,
fluorescent dyes (e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and
the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes (e.g., horseradish
peroxidase, alkaline phosphatase and others commonly used in an ELISA), and
colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene,
polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S.
Patent Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and
4,366,241.

Means of detecting such labels are well known to those of skill in the art.
Thus, for example, radiolabels are detected using photographic film or scintillation
counters, and fluorescent markers are detected using a photodetector to detect emitted
light. Enzymatic labels are typically detected by providing the enzyme with a substrate
and detecting the reaction product produced by the action of the enzyme on the substrate,
and colorimetric labels are detected by simply visualizing the colored label.

A number of well-known robotic systems have been developed for high
throughput screening, particularly in a 96-well format. These systems include automated
workstations like the automated synthesis apparatus developed by Takeda Chemical
Industries, LTD. (Osaka, Japan) and many robotic systems utilizing robotic arms (Zymate
II, Zymark Corporation, Hopkinton, Mass.; Orca, Hewlett-Packard, Palo Alto, Calif.)
which mimic the manual synthetic operations performed by a chemist. Any of the above
devices are suitable for use with the present invention. The nature and implementation of
modifications to these devices (if any) so that they can operate as discussed herein will be
apparent to persons skilled in the relevant art.
In addition, high throughput screening systems themselves are
commercially available (see, e.g., Zymark Corp., Hopkinton, MA; Air Technical
Industries, Mentor, OH; Beckman Instruments, Inc., Fullerton, CA; Precision Systems,
Inc., Natick, MA, etc.). These systems typically automate entire procedures, including all
sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of
the microplate or membrane in detector(s) appropriate for the assay. These configurable
systems provide high throughput and rapid start up as well as a high degree of flexibility
and customization. The manufacturers of such systems provide detailed protocols for the
various high throughput systems.

Solid-Phase Arrays

In one variation of the invention, solid phase arrays are adapted for the
rapid and specific detection of multiple polymorphic nucleotides. Typically, a nucleic
acid probe is linked to a solid support and a target nucleic acid is hybridized to the probe.
Either the probe, or the target, or both, can be labeled, typically with a fluorophore. If the
target is labeled, hybridization is detected by detecting bound fluorescence. If the probe
is labeled, hybridization is typically detected by quenching of the label by the bound
nucleic acid. If both the probe and the target are labeled, detection of hybridization is
typically performed by monitoring a color shift resulting from proximity of the two bound
labels.

In one embodiment, an array of probes is synthesized on a solid support.
Using chip masking technologies and photoprotective chemistry, it is possible to generate
ordered arrays of nucleic acid probes. These arrays, which are known, e.g., as "DNA
chips" or as very large scale immobilized polymer arrays ("VLSIPS"™ arrays), can
include millions of defined probe regions on a substrate having an area of about 1 cm² to
several cm².

The construction and use of solid phase nucleic acid arrays to detect target
nucleic acids is well described in the literature. See, Fodor, et al., Science 251:767
(1991); Sheldon, et al., Clin. Chem. 39(4):718 (1993); Kozal, et al., Nature Medicine
2(7):753 (1996); and Hubbell, U.S. Pat. No. 5,571,639. See also, Pinkel, et al.,
PCT/US95/16155 (WO 96/17958). In brief, a combinatorial strategy allows for the
synthesis of arrays containing a large number of probes using a minimal number of
synthetic steps. For instance, it is possible to synthesize and attach all possible DNA
8-mer oligonucleotides (4⁸, or 65,536 possible combinations) using only 32 chemical
synthetic steps. In general, VLSIPS™ procedures provide a method of producing 4ⁿ
different oligonucleotide probes on an array using only 4n synthetic steps.

Light-directed combinatorial synthesis of oligonucleotide arrays on a glass
surface is performed with automated phosphoramidite chemistry and chip masking
techniques similar to photoresist technologies in the computer chip industry. Typically, a
glass surface is derivatized with a silane reagent containing a functional group, e.g., a
hydroxyl or amine group blocked by a photolabile protecting group. Photolysis through a
photolithographic mask is used selectively to expose functional groups which are then
ready to react with incoming 5'-photoprotected nucleoside phosphoramidites. The
phosphoramidites react only with those sites which are illuminated (and thus exposed by
removal of the photolabile blocking group). Thus, the phosphoramidites only add to
those areas selectively exposed in the preceding step. These steps are repeated until
the desired array of sequences has been synthesized on the solid surface. Combinatorial
synthesis of different oligonucleotide analogues at different locations on the array is
determined by the pattern of illumination during synthesis and the order of addition of
coupling reagents. Monitoring of hybridization of target nucleic acids to the array is
typically performed with fluorescence microscopes or laser scanning microscopes.
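The 4ⁿ-probes-in-4n-steps arithmetic follows from the masking scheme: at each of the n
positions, four masks (one per base) are used, and each mask exposes exactly the sites that
should receive that base. The Python sketch below simulates this, assuming one "step" means
one illumination plus one coupling of a single nucleotide, and labeling each array site by an
integer whose base-4 digits choose its bases. The function name is hypothetical, and the
model ignores coupling efficiency and physical layout.

```python
def light_directed_synthesis(n: int) -> tuple[list[str], int]:
    """Simulate masked synthesis of all 4**n n-mers, one base per masking step.

    Site k's base-4 digits choose the nucleotide added at each position. The
    mask for (position, base) exposes exactly the sites whose digit at that
    position selects that base, so every site gains one base per position.
    """
    bases = "ACGT"
    n_sites = 4 ** n
    sites = [""] * n_sites
    steps = 0
    for position in range(n):
        for digit, base in enumerate(bases):
            steps += 1  # one illumination through a mask plus one coupling
            for site in range(n_sites):
                if (site // 4 ** position) % 4 == digit:
                    sites[site] += base  # couples only at illuminated sites
    return sites, steps


probes, steps = light_directed_synthesis(4)
print(len(set(probes)), steps)  # 256 16
```

By the same counting, n = 8 gives 65,536 distinct 8-mers in 32 steps; the number of steps
grows linearly in probe length while the number of probes grows exponentially.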

In addition to being able to design, build and use probe arrays using
available techniques, one of skill is also able to order custom-made arrays and array-
reading devices from manufacturers specializing in array manufacture. For example,
Affymetrix in Santa Clara, CA manufactures DNA VLSIPS™ arrays.

It will be appreciated that probe design is influenced by the intended
application. For example, where several probe-target interactions are to be detected in a
single assay, e.g., on a single DNA chip, it is desirable to have similar melting
temperatures for all of the probes. Accordingly, the lengths of the probes are adjusted so
that the melting temperatures for all of the probes on the array are closely similar (it will
be appreciated that different lengths for different probes may be needed to achieve a
particular Tm where different probes have different GC contents). Although melting
temperature is a primary consideration in probe design, other factors are optionally used
to further adjust probe construction.
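Length adjustment for a common Tm can be sketched with the Wallace rule (Tm ≈ 2°C per
A or T plus 4°C per G or C), a rough approximation intended for short oligonucleotides. It
is used here only for illustration; the text does not specify a Tm model, and practical
designs usually rely on nearest-neighbor thermodynamics and salt corrections. The function
names, the 48°C target and the example regions are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Approximate Tm (deg C) by the Wallace rule: 2 per A/T plus 4 per G/C."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def trim_to_tm(region: str, target_tm: int, min_len: int = 8) -> str:
    """Shortest prefix of region whose approximate Tm reaches target_tm."""
    for length in range(min_len, len(region) + 1):
        probe = region[:length]
        if wallace_tm(probe) >= target_tm:
            return probe
    return region  # region too short to reach the target; use all of it


gc_rich = "GCGCGCGCGCGCGCGCGCGC"
at_rich = "AT" * 13
print(len(trim_to_tm(gc_rich, 48)), len(trim_to_tm(at_rich, 48)))  # 12 24
```

As the text anticipates, the GC-rich probe reaches the shared target Tm at half the length
of the AT-rich probe.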


Capillary Electrophoresis 

In another embodiment, capillary electrophoresis is used to analyze
polymorphism. This technique works best when the polymorphism is based on size, for
example, RFLP and SSR. This technique is described in detail in U.S. Patent Nos.
5,534,123 and 5,728,282. Briefly, capillary electrophoresis tubes are filled with the
separation matrix. The separation matrix contains hydroxyethyl cellulose, urea and
optionally formamide. The RFLP or SSR samples are loaded onto the capillary tube and
electrophoresed. Because of the small amount of sample and separation matrix required
by capillary electrophoresis, the run times are very short. The molecular sizes, and
therefore the number of nucleotides present in the nucleic acid sample, are determined by
techniques described herein.
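One common way to convert a migration time into a fragment size is to run a size standard
(ladder) and interpolate between the two standard fragments that bracket the sample. The
Python sketch below uses simple linear interpolation. This is an assumption made for
illustration, not necessarily the sizing method of the cited patents; the ladder values are
hypothetical.

```python
from bisect import bisect_left


def size_from_ladder(time: float, ladder: list[tuple[float, float]]) -> float:
    """Estimate fragment size (bp) by linear interpolation against a size ladder.

    ladder: (migration_time, size_bp) pairs for the standard fragments,
    sorted by migration time (smaller fragments arrive first).
    """
    times = [t for t, _ in ladder]
    if not times[0] <= time <= times[-1]:
        raise ValueError("migration time lies outside the ladder")
    i = bisect_left(times, time)
    if times[i] == time:
        return float(ladder[i][1])
    (t0, s0), (t1, s1) = ladder[i - 1], ladder[i]
    return s0 + (s1 - s0) * (time - t0) / (t1 - t0)


ladder = [(10.0, 100.0), (12.0, 150.0), (14.0, 200.0)]
print(size_from_ladder(11.0, ladder))  # 125.0
```

Samples outside the ladder range are rejected rather than extrapolated, since mobility is
not linear in size over wide ranges.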

In a high throughput format, many capillary tubes are placed in a capillary
electrophoresis apparatus. The samples are loaded onto the tubes and electrophoresis of
the samples is run simultaneously. See, Mathies & Huang, Nature 359:167 (1992).
Because the separation matrix is of low viscosity, the capillary tubes can be emptied
and reused after each run.

V. Integrated Systems 

Because of the great number of possible combinations present in one array,
in one aspect of the invention, an integrated system such as a computer, software and data
converting device is used to screen for genetic markers. The phrase "computer system"
in the context of this invention refers to a system in which data entering a computer
correspond to physical objects or processes external to the computer, e.g., nucleic acid
sequence hybridization, and a process that, within a computer, causes a physical
transformation of the input signals to different output signals. In other words, the input
data, e.g., hybridization on a specific region of an array, are transformed to output data,
e.g., the identification of the sequence hybridized. The process within the computer is a
program by which positive hybridization signals are recognized by the computer system
and attributed to a region of the array. The program then determines in which region of
the array the hybridized nucleic acid sequences are located and which specific nucleic
acid sequences hybridize to the probe.
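The program logic described above (recognize positive signals, attribute them to array
regions, and report the sequences that hybridized) might look like the following minimal
sketch. It assumes a simple intensity threshold to call a signal positive and a lookup table
from array position to the probe synthesized there. It reports the hybridized target as the
reverse complement of the probe, because the target pairs with the probe in antiparallel
orientation. All names, positions and intensities are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hybridized_targets(
    signals: dict[tuple[int, int], float],
    probe_map: dict[tuple[int, int], str],
    threshold: float,
) -> dict[tuple[int, int], str]:
    """Map array positions with a positive signal to the target sequence implied.

    signals:   {(row, col): measured intensity}
    probe_map: {(row, col): probe sequence synthesized at that position}
    """
    calls = {}
    for position in sorted(signals):
        if signals[position] >= threshold:  # positive hybridization signal
            probe = probe_map[position].upper()
            # Target is complementary and antiparallel: its 5'->3' sequence
            # is the reverse complement of the probe.
            calls[position] = probe.translate(_COMPLEMENT)[::-1]
    return calls


probe_map = {(0, 0): "AACG", (0, 1): "TTTT", (1, 0): "GGCA", (1, 1): "ACGT"}
signals = {(0, 0): 950.0, (0, 1): 40.0, (1, 0): 610.0, (1, 1): 15.0}
print(hybridized_targets(signals, probe_map, threshold=500.0))
```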



VI. Marker Assisted Selection in Plants

A primary motivation for development of molecular markers in crop
species is the potential for increased efficiency in plant breeding through marker assisted
selection (MAS). After QTL have been identified through the statistical models
described above, the corresponding genetic marker alleles can be used to identify plants
that contain the desired genotype at multiple loci and would be expected to transfer the
desired genotype along with the desired phenotype to their progeny.

The presence and/or absence of a particular genetic marker allele in the
genome of a plant exhibiting a preferred phenotypic trait is determined by any method
listed above, e.g., RFLP, AFLP, SSR, amplification of variable sequences, and ASH. If the
nucleic acids from the plant hybridize to a probe specific for a desired genetic marker,
the plant can be selfed to create a true breeding line with the same genome, or it can be
crossed with a plant with the same QTL or with other desired characteristics to create a
sexually crossed F1 generation.
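Once marker genotypes are available, selecting plants that carry the desired alleles at
several QTL-linked markers reduces to a filter. The sketch below assumes diploid genotype
calls stored as allele pairs and offers an optional stricter rule requiring homozygosity,
which is relevant when a plant is to be selfed toward a true breeding line. The marker
names (m1, m2), plant names and allele symbols are hypothetical.

```python
def carries(pair: tuple[str, ...], allele: str, require_homozygous: bool) -> bool:
    """True if a genotype call contains the allele (or is homozygous for it)."""
    if require_homozygous:
        return len(pair) > 0 and all(a == allele for a in pair)
    return allele in pair


def select_plants(
    genotypes: dict[str, dict[str, tuple[str, ...]]],
    favorable: dict[str, str],
    require_homozygous: bool = False,
) -> list[str]:
    """Names of plants carrying the favorable allele at every listed marker.

    genotypes: {plant: {marker: (allele, allele)}}
    favorable: {marker: favorable allele}
    """
    return sorted(
        plant
        for plant, calls in genotypes.items()
        if all(carries(calls.get(m, ()), a, require_homozygous)
               for m, a in favorable.items())
    )


genotypes = {
    "P1": {"m1": ("A", "A"), "m2": ("B", "B")},
    "P2": {"m1": ("A", "a"), "m2": ("B", "B")},
    "P3": {"m1": ("a", "a"), "m2": ("B", "b")},
}
favorable = {"m1": "A", "m2": "B"}
print(select_plants(genotypes, favorable))                           # ['P1', 'P2']
print(select_plants(genotypes, favorable, require_homozygous=True))  # ['P1']
```

A plant missing a call at any listed marker is not selected, since it cannot be confirmed
to carry the favorable allele.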

"Positional gene cloning" uses the proximity of a genetic marker to
physically define a cloned chromosomal fragment that is linked to a QTL identified using
the statistical methods herein. Clones of linked nucleic acids have a variety of uses,
including as genetic markers for identification of linked QTLs in subsequent marker
assisted selection (MAS) protocols, and to improve desired properties in recombinant
plants where expression of the cloned sequences in a transgenic plant affects an identified
trait. Common linked sequences which are desirably cloned include open reading frames,
e.g., encoding nucleic acids or proteins which provide a molecular basis for an observed
QTL. If markers are proximal to the open reading frame, they may hybridize to a given
DNA clone, thereby identifying a clone on which the open reading frame is located. If
flanking markers are more distant, a fragment containing the open reading frame may be
identified by constructing a contig of overlapping clones.

In certain applications it is advantageous to make or clone large nucleic
acids to identify nucleic acids more distantly linked to a given marker, or to isolate nucleic
acids linked to or responsible for QTLs as identified herein. It will be appreciated that a
nucleic acid genetically linked to a polymorphic nucleotide optionally resides up to
about 50 centimorgans from the polymorphic nucleic acid, although the precise distance
will vary depending on the cross-over frequency of the particular chromosomal region.
Typical distances from a polymorphic nucleotide are in the range of 1-50 centimorgans,
for example, often less than 1 centimorgan, less than about 1-5 centimorgans, about 1-5,
1, 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 centimorgans, etc.
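Map distances in centimorgans relate to observed recombination fractions through a mapping
function. The text does not specify one; the sketch below uses Haldane's function, which
assumes no crossover interference (Kosambi's function is a common alternative). Under
Haldane's function, d = -½ ln(1 - 2r) Morgans, so 50 cM corresponds to a recombination
fraction of about 0.32, and r approaches but never reaches 0.5 as distance grows. The
function names are hypothetical.

```python
import math


def haldane_cm(r: float) -> float:
    """Map distance (cM) from recombination fraction r, assuming no interference."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)  # 100 cM per Morgan * (-1/2) ln(1 - 2r)


def haldane_r(cm: float) -> float:
    """Recombination fraction expected at a map distance of cm centimorgans."""
    return 0.5 * (1.0 - math.exp(-cm / 50.0))


print(round(haldane_r(50.0), 4))   # 0.3161
print(round(haldane_cm(0.01), 4))  # 1.0101
```

For small distances the two scales nearly coincide (1% recombination is about 1 cM), which
is why short distances are often quoted directly as percent recombination.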

Many methods of making large recombinant RNA and DNA nucleic acids,
including recombinant plasmids, recombinant lambda phage, cosmids, yeast artificial
chromosomes (YACs), P1 artificial chromosomes (PACs), bacterial artificial chromosomes
(BACs), and the like, are known. A general introduction to YACs, BACs, PACs and
MACs as artificial chromosomes is provided in Monaco & Larin, Trends Biotechnol.
12:280-286 (1994). Examples of appropriate cloning techniques for making large nucleic
acids, and instructions sufficient to direct persons of skill through many cloning exercises,
are also found in Berger, Sambrook, and Ausubel, all supra.

In one aspect, nucleic acids hybridizing to the genetic markers linked to
QTLs identified by the above methods are cloned into large nucleic acids such as YACs,
or are detected in YAC genomic libraries cloned from the crop of choice. The
construction of YACs and YAC libraries is known. See, Berger, supra, and Burke, et al.,
Science 236:806-812 (1987). Gridded libraries of YACs are described in Anand, et al.,
Nucleic Acids Res. 17:3425-3433 (1989); Anand, et al., Nucleic Acids Res. 18:1951-1956
(1990); and Riley, Nucleic Acids Res. 18(10):2887-2890 (1990); these and the references
therein describe cloning of YACs and related technologies. YAC libraries containing large
fragments of soybean DNA have been constructed. See, Funke & Kolchinsky, CRC
Press, Boca Raton, FL, pp. 125-308 (1994); Marek & Shoemaker, Soybean Genet. Newsl.
23:126-129 (1996); Danish, et al., Soybean Genet. Newsl. 24:196-198 (1997). YAC
libraries for many other commercially important crops are available, or can be
constructed using known techniques. See also, Ausubel, chapter 13, for a description of
procedures for making YAC libraries.

Similarly, cosmids or other molecular vectors such as BAC and P1
constructs are also useful for isolating or cloning nucleic acids linked to genetic markers.
Cosmid cloning is also known. See, e.g., Ausubel, chapter 1.10.11 (supplement 13) and
the references therein. See also, Ish-Horowicz & Burke, Nucleic Acids Res. 9:2989-2998
(1981); Murray, Lambda II (Hendrix et al., eds.) pp. 395-432, Cold Spring Harbor
Laboratory, NY (1983); Frischauf, et al., J. Mol. Biol. 170:827-842 (1983); and Dunn &
Blattner, Nucleic Acids Res. 15:2677-2698 (1987), and the references cited therein.

Construction of BAC and P1 libraries is known; see, e.g., Ashworth, et al., Anal.
Biochem. 224(2):564-571 (1995); Wang, et al., Genomics 24(3):527-534 (1994); Kim, et
al., Genomics 22(2):336-9 (1994); Rouquier, et al., Anal. Biochem. 217(2):205-9 (1994);
Shizuya, et al., Proc. Nat'l Acad. Sci. USA 89(18):8794-7 (1992); Kim, et al., Genomics
22(2):336-9 (1994); Woo, et al., Nucleic Acids Res. 22(23):4922-31 (1994); Wang, et al.,
Plant 3:525-33 (1995); Cai, Genomics 29(2):413-25 (1995); Schmitt, et al., Genomics
33(1):9-20 (1996); Kim, et al., Genomics 34(2):213-8 (1996); Kim, et al., Proc.
Nat'l Acad. Sci. USA 13:6297-301 (1996); Pusch, et al., Gene 183(1-2):29-33 (1996);
and Wang, et al., Genome Res. 6(7):612-9 (1996). Improved methods of in vitro
amplification to amplify large nucleic acids linked to the polymorphic nucleic acids
herein are summarized in Cheng, et al., Nature 369:684-685 (1994) and the references
therein.

In addition, any of the cloning or amplification strategies described herein are useful
for creating contigs of overlapping clones, thereby providing overlapping nucleic acids
which show the physical relationship at the molecular level for genetically linked
nucleic acids. A common example of this strategy is found in whole organism sequencing
projects, in which overlapping clones are sequenced to provide the entire sequence of a
chromosome. In this procedure, a library of the organism's cDNA or genomic DNA is made
according to standard procedures described, e.g., in the references above. Individual
clones are isolated and sequenced, and overlapping sequence information is ordered to
provide the sequence of the organism. See also Tomb, et al., Nature 388:539-547 (1997),
describing the whole genome random sequencing and assembly of the complete genomic
sequence of Helicobacter pylori; Fleischmann, et al., Science 269:496-512 (1995),
describing whole genome random sequencing and assembly of the complete Haemophilus
influenzae genome; Fraser, et al., Science 270:397-403 (1995), describing whole genome
random sequencing and assembly of the complete Mycoplasma genitalium genome; and Bult,
et al., Science 273:1058-1073 (1996), describing whole genome random sequencing and
assembly of the complete Methanococcus jannaschii genome. Recently, Hagiwara and Curtis,
Nucleic Acids Res. 24(12):2460-2461 (1996), developed a 'long distance sequencer' PCR
protocol for generating overlapping nucleic acids from very large clones to facilitate
sequencing, and methods of amplifying and tagging the overlapping nucleic acids into
suitable sequencing templates. The methods can be used in conjunction with shotgun
sequencing techniques to improve the efficiency of the shotgun methods typically used in
whole organism sequencing projects. As applied to the present invention, the techniques
are useful for identifying and sequencing genomic nucleic acids genetically linked to the
QTLs, as well as "candidate" genes responsible for QTL expression as identified by the
methods herein.
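The step of ordering overlapping sequence information into a contig can be illustrated with a minimal greedy-overlap sketch. It assumes error-free reads that are all given in the same orientation and a hypothetical minimum overlap of three bases; the function names are illustrative only, and the assemblers actually used in whole genome sequencing projects are considerably more elaborate.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Greedily merge the pair of sequences with the largest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    # Discard reads that are wholly contained in a longer read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                olen = overlap_length(left, right, min_overlap)
                if olen > best_len:
                    best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining sequences do not overlap; keep them as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

For example, `greedy_assemble(["ATGCGTAC", "GTACCTTA", "CTTAGGCA"])` returns `["ATGCGTACCTTAGGCA"]`: the first two reads share the four-base overlap GTAC, and the merged sequence then shares CTTA with the third read.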

In another embodiment, F1 clonal plants can be grown from cells of the selected plant. In
yet another embodiment, the allelic sequences that comprise a QTL can be cloned and
inserted into a transgenic plant. Methods of creating transgenic plants are well known in
the art and are described in brief below.



VII. Transgenic Plants 

A. Making Transgenic Plants

Nucleic acids derived from those linked to a QTL identified by the statistical methods
herein are introduced into plant cells, either in culture or in organs of a plant, e.g.,
leaves, stems, fruit, seed, etc. The expression of natural or synthetic nucleic acids can
be achieved by operably linking a nucleic acid of interest to a promoter, incorporating
the construct into an expression vector, and introducing the vector into a suitable host
cell.

Typical vectors contain transcription and translation terminators, transcription and
translation initiation sequences, and promoters useful for regulation of the expression of
the particular nucleic acid. The vectors optionally comprise generic expression cassettes
containing promoter, gene, and terminator sequences, sequences permitting replication of
the cassette in eukaryotes, or prokaryotes, or both (e.g., shuttle vectors), and selection
markers for both prokaryotic and eukaryotic systems. Vectors are suitable for replication
and integration in prokaryotes, eukaryotes, or preferably both. See Giliman & Smith, Gene
8:81 (1979); Roberts, et al., Nature 328:731 (1987); Schneider, et al., Protein Expr.
Purif. 6435:10 (1995); Berger & Kimmel; Sambrook; and Ausubel.



B. Cloning of QTL Allelic Sequences into Bacterial Hosts

Bacterial cells can be used to increase the number of plasmids containing the DNA
constructs of this invention. The bacteria are grown to log phase and the plasmids within
the bacteria are isolated by a variety of methods known in the art (see, for instance,
Sambrook). In addition, a plethora of kits are commercially available for the purification
of plasmids from bacteria. For their proper use, follow the manufacturer's instructions
(see, for example, EasyPrep™ and FlexiPrep™, both from Pharmacia Biotech; StrataClean™,
from Stratagene; and QIAexpress™ Expression System, Qiagen). The isolated and purified
plasmids can then be further manipulated to produce other plasmids, used to transfect
plant cells, or incorporated into Agrobacterium tumefaciens to infect plants.

The in vitro delivery of nucleic acids into bacterial hosts can be to any cell grown in
culture. Contact between the cells and the genetically engineered nucleic acid constructs,
when carried out in vitro, takes place in a biologically compatible medium. The
concentration of nucleic acid varies widely depending on the particular application, but
is generally between about 1 µM and about 10 mM. Treatment of the cells with the nucleic
acid is generally carried out at physiological temperatures (about 37 °C) for periods of
time of from about 1 to 48 hours, but preferably of from about 2 to 4 hours.

Alternatively, the nucleic acid operably linked to the promoter to form a fusion gene can
be expressed in bacteria such as E. coli, and its gene product isolated and purified.
There are several well-known methods of introducing nucleic acids into bacterial cells,
any of which may be used in the present invention. These include fusion of the recipient
cells with bacterial protoplasts containing the DNA, electroporation, projectile
bombardment, and infection with viral vectors.

C. Transfecting Plant Cells

Preparation of Recombinant Vectors 

To use isolated sequences in the above techniques, recombinant DNA vectors suitable for
transformation of plant cells are prepared. Techniques for transforming a wide variety of
higher plant species are well known and described in the technical and scientific
literature. See, for example, Weising, et al., Ann. Rev. Genet. 22:421-477 (1988). A DNA
sequence coding for the desired polypeptide, for example, a cDNA sequence encoding a full
length protein, will preferably be combined with transcriptional and translational
initiation regulatory sequences which will direct the transcription of the sequence from
the gene.
Promoters can be identified by analyzing the 5' sequences upstream of the coding sequence
of an allele associated with a QTL. Sequences characteristic of promoter sequences can be
used to identify the promoter. Sequences controlling eukaryotic gene expression have been
extensively studied. For instance, promoter sequence elements include the TATA box
consensus sequence (TATAAT), which is usually 20 to 30 base pairs upstream of the
transcription start site. In most instances the TATA box is required for accurate
transcription initiation. In plants, further upstream from the TATA box, at positions -80
to -100, there is typically a promoter element with a series of adenines surrounding the
trinucleotide G (or T) N G. J. Messing, et al., in Genetic Engineering in Plants,
pp. 221-227 (Kosuge, Meredith and Hollaender, eds. (1983)).
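The search for characteristic promoter elements described above can be illustrated with a short motif scan over the sequence 5' of a transcription start site. This is a sketch under stated assumptions: the degenerate pattern TATA[AT][AT] (which includes the TATAAT consensus given above) and the pattern for the adenine-flanked G (or T) N G element, which here requires at least two adenines on each side, are hypothetical simplifications chosen for illustration. The search windows follow the positions given in the text, and the function names are illustrative only.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, simplified motif patterns (illustration only).
TATA_BOX = re.compile(r"TATA[AT][AT]")               # includes the TATAAT consensus
AGNG_ELEMENT = re.compile(r"A{2,}[GT][ACGT]GA{2,}")  # adenines around G (or T) N G


def find_elements(upstream: str, pattern: "re.Pattern[str]",
                  window: Tuple[int, int]) -> List[Tuple[int, str]]:
    """Return (position, motif) for non-overlapping matches starting inside `window`.

    `upstream` is the sequence immediately 5' of the transcription start site,
    so its last base is position -1 and its first base is -len(upstream).
    """
    seq = upstream.upper()
    lo, hi = window
    hits = []
    for match in pattern.finditer(seq):
        position = match.start() - len(seq)  # coordinate of the motif's first base
        if lo <= position <= hi:
            hits.append((position, match.group(0)))
    return hits


def scan_promoter(upstream: str) -> Dict[str, List[Tuple[int, str]]]:
    """Report TATA-like hits at -30..-20 and adenine-rich GNG hits at -100..-80."""
    return {
        "tata_box": find_elements(upstream, TATA_BOX, (-30, -20)),
        "agng_element": find_elements(upstream, AGNG_ELEMENT, (-100, -80)),
    }
```

For a 100-base upstream sequence of cytosines carrying AAAGTGAAA starting at -90 and TATAAT starting at -28, `scan_promoter` reports one hit in each window. Real promoter prediction would typically use position weight matrices rather than fixed regular expressions.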

A number of methods are known to those of skill in the art for identifying and
characterizing promoter regions in plant genomic DNA (see, e.g., Jordano, et al., Plant
Cell 1:855-866 (1989); Bustos, et al., Plant Cell 1:839-854 (1989); Green, et al., EMBO J.
7:4035-4044 (1988); Meier, et al., Plant Cell 3:309-316 (1991); and Zhang, et al., Plant
Physiology 110:1069-1079 (1996)).

In construction of recombinant expression cassettes of the invention, a plant promoter
fragment may be employed which will direct expression of the gene in all tissues of a
regenerated plant. Such promoters are referred to herein as "constitutive" promoters and
are active under most environmental conditions and states of development or cell
differentiation. Examples of constitutive promoters include the cauliflower mosaic virus
(CaMV) 35S transcription initiation region, the ubiquitin promoter, the 1'- or 2'-promoter
derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation
regions from various plant genes known to those of skill.

Alternatively, the plant promoter may direct expression of the polynucleotide of the
invention in a specific tissue (tissue-specific promoters) or may be otherwise under more
precise environmental control (inducible promoters). Examples of tissue-specific promoters
under developmental control include promoters that initiate transcription only in certain
tissues, such as fruit, seeds, or flowers. As noted above, the tissue specific E8 promoter
from tomato is particularly useful for directing gene expression so that a desired gene
product is located in fruits. Other suitable promoters include those from genes encoding
embryonic storage proteins. Examples of environmental conditions that may affect
transcription by inducible promoters include anaerobic conditions, elevated temperature,
or the presence of light.

If proper polypeptide expression is desired, a polyadenylation region at the 3'-end of the
coding region should be included. The polyadenylation region can be derived from the
natural gene, from a variety of other plant genes, or from T-DNA.

The vector comprising the sequences (e.g., promoters or coding regions) from genes of the
invention will typically comprise a marker gene which confers a selectable phenotype on
plant cells. For example, the marker may encode biocide resistance, particularly
antibiotic resistance, such as resistance to kanamycin, G418, bleomycin, or hygromycin, or
herbicide resistance, such as resistance to chlorsulfuron or glufosinate.



Introduction of the Nucleic Acids into Plant Cells 

The DNA constructs of the invention are introduced into plant cells, either in culture or
in the organs of a plant, by a variety of conventional techniques. For example, the DNA
construct can be introduced directly into the genomic DNA of the plant cell using
techniques such as electroporation and microinjection of plant cell protoplasts, or the
DNA constructs can be introduced directly to plant cells using ballistic methods, such as
DNA particle bombardment. Alternatively, the DNA constructs are combined with suitable
T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host
vector. The virulence functions of the Agrobacterium tumefaciens host direct the insertion
of the construct and adjacent marker into the plant cell DNA when the cell is infected by
the bacteria.

Microinjection techniques are known in the art and well described in the scientific and
patent literature. The introduction of DNA constructs using polyethylene glycol
precipitation is described in Paszkowski, et al., EMBO J. 3:2717 (1984). Electroporation
techniques are described in Fromm, et al., Proc. Nat'l Acad. Sci. USA 82:5824 (1985).
Ballistic transformation techniques are described in Klein, et al., Nature 327:70-73
(1987).

Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use
of binary vectors, are also well described in the scientific literature. See, for example,
Horsch, et al., Science 223:496-498 (1984), and Fraley, et al., Proc. Nat'l Acad. Sci. USA
80:4803 (1983). Agrobacterium-mediated transformation is a preferred method of
transformation of dicots.



Generation of Transgenic Plants 

Transformed plant cells which are derived by any of the above transformation techniques
can be cultured to regenerate a whole plant which possesses the transformed genotype and
thus the desired phenotype. Such regeneration techniques rely on manipulation of certain
phytohormones in a tissue culture growth medium, typically relying on a biocide and/or
herbicide marker which has been introduced together with the desired nucleotide
sequences. Plant regeneration from cultured protoplasts is described in Evans, et al.,
Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176, Macmillan
Publishing Company, New York (1983); and Binding, Regeneration of Plants, Plant
Protoplasts, pp. 21-73, CRC Press, Boca Raton (1985). Regeneration can also be obtained
from plant callus, explants, somatic embryos (Dandekar, et al., J. Tissue Cult. Meth.
12:145 (1989); McGranahan, et al., Plant Cell Rep. 8:512 (1990)), organs, or parts
thereof. Such regeneration techniques are described generally in Klee, et al., Ann. Rev.
of Plant Phys. 38:467-486 (1987).

One of skill will recognize that after the expression cassette is stably incorporated in
transgenic plants and confirmed to be operable, it can be introduced into other plants by
sexual crossing. Any of a number of standard breeding techniques can be used, depending
upon the species to be crossed.

It is understood that the embodiments described herein are for illustrative purposes only
and that various modifications or changes in light thereof will be suggested to persons
skilled in the art and are to be included within the spirit and purview of this
application and the scope of the appended claims. All publications, patents, and patent
applications cited herein are hereby incorporated by reference for all purposes.
WHAT IS CLAIMED IS:

1. A method of identifying quantitative trait loci in a mixed defined plant population
comprising multiple plant families, the method comprising:
i) quantifying a phenotypic trait across lines sampled from the mixed population,
thereby providing a quantified population phenotype;
ii) identifying at least one genetic marker associated with the distribution of the
phenotypic trait by screening a set of markers for associations with the quantified
population phenotype; and
iii) identifying the quantitative trait loci based on the association of the phenotypic
trait and genetic marker.

2. The method of claim 1, wherein the mixed plant population consists of diploid plants.

3. The method of claim 1, wherein the mixed plant population consists of inbred plants.

4. The method of claim 1, wherein the mixed plant population consists of hybrid plants.

5. The method of claim 1, wherein the phenotypic trait of the progeny of one line from one
family in the plant population is evaluated in topcross combination with tester parents.

6. The method of claim 1, wherein the plant population is selected from maize, soybean,
sorghum, wheat, sunflower, or canola.

7. The method of claim 6, wherein the plant population is maize.

8. The method of claim 7, wherein the plant population consists of the species Zea mays.

9. The method of claim 1, wherein the phenotypic trait is selected from yield, grain
moisture, grain oil, root lodging, stalk lodging, plant height, ear height, disease
resistance, or insect resistance.
10. The method of claim 1, wherein at least two genetic markers are identified in
association with the quantified trait loci.

11. The method of claim 1, wherein genotyping of genetic markers used for association with
the phenotypic trait is done by high throughput screening.

12. The method of claim 1, wherein the genetic markers are restriction fragment length
polymorphisms (RFLP), isozyme markers, allele specific hybridization (ASH), amplified
variable sequences of the plant genome, self-sustained sequence replication, simple
sequence repeat (SSR), or arbitrary fragment length polymorphisms (AFLP).

13. The method of claim 12, wherein the genetic markers are selected by allele specific
hybridization.

14. The method of claim 1, wherein the association of the phenotypic trait and the genetic
markers is determined by applying a statistical model.

15. The method of claim 14, wherein the model comprises parameters with fixed effects for
QTL and family backgrounds.

16. The method of claim 14, wherein the model comprises parameters with random effects for
QTL and family backgrounds.

17. The method of claim 14, wherein the model comprises parameters with mixed effects for
QTL and family backgrounds.

18. The method of claim 1, further comprising selecting for a desired phenotypic trait in
progeny of a plant breeding population.

19. The method of claim 18, wherein the plant population consists of diploid plants.

20. The method of claim 18, wherein the plant population consists of hybrid plants.



21. The method of claim 18, wherein the plant population consists of inbred plants.
22. The method of claim 18, wherein the plant population is maize, soybean, sorghum,
wheat, sunflower, or canola.

23. The method of claim 22, wherein the plant population is maize.

24. The method of claim 23, wherein the plant population consists of the species Zea mays.

25. The method of claim 18, wherein the phenotypic trait is yield, grain moisture, grain
oil, root lodging, stalk lodging, plant height, ear height, disease resistance, or insect
resistance.

26. The method of claim 18, wherein at least two genetic markers are identified.

27. The method of claim 18, wherein genotypes of the identified markers are determined by
high throughput screening.

28. The method of claim 18, wherein the association of phenotypic traits and genetic
markers is determined by applying a statistical model.

29. The method of claim 28, wherein the model comprises parameters with fixed effects for
QTL and family backgrounds.

30. The method of claim 28, wherein the model comprises parameters with random effects for
QTL and family backgrounds.

31. The method of claim 28, wherein the model comprises parameters with mixed effects for
QTL and family backgrounds.

32. The method of claim 18, further comprising marker assisted selection of plants with a
desired phenotype by detecting and selecting for the quantitative trait loci identified in
step (iii).

33. A method of selecting plants with a desired phenotype by marker assisted selection of
genetic markers associated with a quantitative trait locus identified by the method of
claim 1.
34. A plant selected by the method of claim 1.

35. The method of claim 1, further comprising cloning a nucleic acid in linkage
disequilibrium with an identified trait locus; and transducing the nucleic acid into a
plant.

36. The method of claim 35, wherein the nucleic acid is introduced into a plant in an
expression cassette comprising a promoter operably linked to the nucleic acid.

37. The method of claim 35, wherein the plant is sexually crossed with a second plant.

38. The transgenic plant made by the method of claim 35.

39. The transgenic plant of claim 38, which is a member of the species Zea mays.
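The marker-trait association recited in claims 1 and 14 to 17 can be illustrated, for the fixed-effects case of claim 15, with a single-marker scan in which each family background is absorbed as a fixed effect by centering phenotype and marker dosage within families. This is a minimal sketch under stated assumptions: markers are coded as hypothetical allele dosages (0, 1, 2), each marker is tested separately with an F statistic for the marker term, and the function names are illustrative. It is not presented as the statistical model used in this application; random- or mixed-effects versions (claims 16 and 17) would instead require a mixed-model fitting routine.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def center_within_family(values: Sequence[float], families: Sequence[str]) -> List[float]:
    """Subtract each family's mean, which absorbs family background as a fixed effect."""
    groups = defaultdict(list)
    for value, family in zip(values, families):
        groups[family].append(value)
    means = {family: sum(v) / len(v) for family, v in groups.items()}
    return [value - means[family] for value, family in zip(values, families)]


def marker_f_test(phenotype: Sequence[float], dosage: Sequence[float],
                  families: Sequence[str]) -> Tuple[float, float]:
    """Fit y = family + beta * dosage + e and return (beta, F) for H0: beta = 0.

    Within-family centering gives the same beta as least squares with one
    indicator variable per family (Frisch-Waugh-Lovell theorem).
    """
    yc = center_within_family(phenotype, families)
    xc = center_within_family(dosage, families)
    sxx = sum(x * x for x in xc)
    if sxx == 0:
        return 0.0, 0.0  # marker does not segregate within any family
    sxy = sum(x * y for x, y in zip(xc, yc))
    beta = sxy / sxx
    ss_family_only = sum(y * y for y in yc)    # residual SS with family effects only
    ss_full = ss_family_only - sxy * sxy / sxx  # residual SS with family + marker
    df_resid = len(phenotype) - len(set(families)) - 1
    if df_resid <= 0:
        raise ValueError("too few plants for the number of families")
    if ss_full <= 0:
        return beta, float("inf")  # marker explains all within-family variation
    f_stat = (ss_family_only - ss_full) / (ss_full / df_resid)
    return beta, f_stat


def scan_markers(phenotype: Sequence[float], genotypes: Dict[str, Sequence[float]],
                 families: Sequence[str]) -> List[Tuple[str, float, float]]:
    """Test every marker and return (marker, beta, F) sorted by decreasing F."""
    results = [(name, *marker_f_test(phenotype, dosage, families))
               for name, dosage in genotypes.items()]
    return sorted(results, key=lambda r: r[2], reverse=True)
```

In practice each F statistic would be converted to a p-value using an F distribution with 1 and n - f - 1 degrees of freedom (n plants, f families) and adjusted for the number of markers screened.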